Setup
# loading libraries
library(tidyverse)
library(here)
library(scales)
library(gt)# loading libraries
library(tidyverse)
library(here)
library(scales)
library(gt)# locating
here::i_am("analysis/Kish_HP2.qmd")
library(here)
# reading in data
# df was used instead of a longer name for the sake of my sanity in typing
df <- read.csv(here("data/Employee_Salaries.csv"))# Department
# Abbreviation/acronym for the department name. Example: ABS for the Alcohol Beverage Services department
# Department_Name
# the department name for all active, permanent employees of Montgomery County, MD (Maryland)
# Division
# The division for each department. That is, a single department can have multiple divisions and focuses.
# Example: ABS has the divisions that include those of ABS 85 Administrative Services
# and ABS 85 Beer Delivery Operations
# Gender
# Sex
# the Gender of M and F. Assuming that since this is public info, M is to be the legal sex of the individual M for Male and F for Female.
# Base_Salary
# precise base salaries (including down to the cent) for the individual, at the end of the calendar year.
# this is projected data. Should be accurate, but projection allows for greater flexibiliity.
# Overtime_Pay
# amount of overtime pay an employee/individual received for working beyond what they were required to do
# Longevity_Pay
# Longevity pay gives additional compensation for qualifying employees who have worked for a good deal of time. The longer time that is worked, the greater the longevity pay. Specifics about how long are dealt with in-house.
# Grade
# the standing an employee has within a company. The internal codes for all positions are not provided, but we can safely look at M1, M2, M3, etc. as those are clearly levels of management. There are thousands of grades.
# M4-Senior Manager (Not included in this particular data)
# M3-Middle Manager
# M2-Middle Manager
# M1-First Level ManagerWorkplace inequalities are a concern to many. We’ll be examining that topic — with the example of reliable and public data from Montgomery County, MD.
The employees in question are all active, permanent employees of Montgomery County, MD (Maryland).
Each employee works in a department, where each department can have multiple divisions.
In addition to one’s guaranteed base pay, an employee could earn overtime pay, longevity pay (when eligible for working at the company for a long time), both (additional compensation), either overtime or longevity pay, or neither.
These forms of pay (specifically, base pay and additional pay) sum to an employee’s total compensation.
Additional detail can be found above in “Explanation of Data Variables” tab.
We will begin by exploring the relationship between base pay and total compensation across all employees.
Then, we will look to see how this relationship develops across gender, management, and department.
Topic 1) How do base pay and additional pay relate to total compensation? is taking on a little more additional pay (at least if you can, like overtime) worth it?
Topic 2) Are there any disparities between the genders in light of the previous analysis?
Topic 3) Is management similar to overall jobs? Are Male and Female equally represented here? Is there evidence of a glass-ceiling, or it being harder for one sex to reach higher compensation due to real or imagined contraints?
Topic 4) How do the genders vary by department? Is it oversampling a gender so that we’re skewing our analyis?
First, let’s establish the data that we’ll be looking at
# Computing Total and Additional Compensation
df$Total_Compensation <- df$Base_Salary + df$Overtime_Pay + df$Longevity_Pay
df$Additional_Compensation <- df$Overtime_Pay + df$Longevity_Pay
# Reading Gender as Sex
df <- rename(df, Sex = "Gender")
# str(df)
# checking the datatypes for each variable
# df <- df |> mutate(across(where(is.double), as.integer))
# Double deemed more useful, can change to integer if further analysis needs integer type
# detecting NA Values
sum(is.na(df))Additional compensation will be the combination of overtime and longevity pay.
Total compensation will be the combination of base pay and additional compensation.
In the data, there are 0 missing values. (Good news!)
Assuming that since this data comes what is available to verified by the public,
M is assumed to be the legal sex of the individual — M for Male and F for Female. That is, sex as legality rather than gender as identity seems to be what is implied. Occasionally, the term “gender” might be used here as well to refer to “sex”.
#each row is a unique employee, for each dataset
# Total Count & cresting Employee number column
Total_Employees <- nrow(df)
df <- mutate(df, Employee = row_number())
# Base Count & Percentage
Employees_Without_Base <- filter(df, Base_Salary == 0.00)
Employees_With_Base <- anti_join(df, Employees_Without_Base)
Percent_With_Base <- (nrow(Employees_With_Base) / Total_Employees) * 100
# Over Count & Percentage
Employees_Without_Overtime <- filter(df, Overtime_Pay == 0.00)
Employees_With_Overtime <- anti_join(df, Employees_Without_Overtime)
Percent_Overtime <- (nrow(Employees_With_Overtime) / Total_Employees) * 100
# Long Count & Percentage
Employees_Without_Longevity <- filter(df, Longevity_Pay == 0.00)
Employees_With_Longevity <- anti_join(df, Employees_Without_Longevity)
Percent_Longevity <- (nrow(Employees_With_Longevity) / Total_Employees) * 100
# Either Over or Long Count & Percentage
Employees_Without_Over_Long <- filter(df, Overtime_Pay == 0.00 | Longevity_Pay == 0.00)
Employees_With_Over_Long <- anti_join(df, Employees_Without_Over_Long)
Percent_Over_Long <- (nrow(Employees_With_Over_Long) / Total_Employees) * 100
# Add Comp Count & Percentage
Employees_Without_Add_Comp <- filter(df, Additional_Compensation == 0.00)
Employees_With_Add_Comp <- anti_join(df, Employees_Without_Add_Comp)
Percent_Add_Comp <- (nrow(Employees_With_Add_Comp) / Total_Employees) * 100There are 10291 employees total, where 0 employees have no base salary (100% have a base salary)
Approximately 16.3% have overtime pay, 27.6% have longevity pay, and 16.3% have one or the either, and 66.1% have both.
Is overtime pay worth it? Does it have diminishing returns? If overtime pay (and other additional compensation) had diminishing returns, we would expect its slope to be less than 1. A slope of greater than 1 indicates that you get more and more total compensation for every bit more overtime and other additional pay you do. A higher slope indicates a greater degree of this scaling — which is desireable!
Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(df))
Over_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Overtime))
Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Longevity))
Over_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Over_Long))
Add_Comp_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Add_Comp))Based on the coefficients for the graphs, those who only get base pay have their total compensation scale up the least with those having overtime and longevity pay scaling up the most.
In particular, as is seen in the plots too, overtime pay scales slightly larger than longevity pay.
So, there does seem to be a non-zero difference between the slope of taking on overtime (and getting more pay) + total compensation and the slope of 1 (overtime pay is directly correlational with total compensation)
#overall
p <- df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees") +
geom_smooth(method = "lm") +
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
pThe slope coefficient (intercept and then slope) is 3269.0284279, 1.070269.
p2 <- Employees_With_Overtime |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Overtime Pay") +
geom_smooth(method = "lm") +
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
p2The slope coefficient (intercept and then slope) is -1.1910607^{4}, 1.3303052.
p3 <- Employees_With_Longevity |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Longevity Pay") +
geom_smooth(method = "lm") +
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
p3The slope coefficient (intercept and then slope) is -1.1115563^{4}, 1.2845143.
p4 <- Employees_With_Over_Long |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Overtime or Longevity Pay") +
geom_smooth(method = "lm") +
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
p4The slope coefficient (intercept and then slope) is -3.2010085^{4}, 1.5664147.
p5 <- Employees_With_Add_Comp |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Those with Additional Compensation") +
geom_smooth(method = "lm")+
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
p5The slope coefficient (intercept and then slope) is -4803.6504064, 1.2194241.
# addiing to the original df a new column, using mutate, that is the percent of total
# compensation that is each of the relevant factors.
df <- group_by(df, Employee) |>
mutate(Percent_Over = Overtime_Pay/Total_Compensation)
df$Percent_Over <- round(df$Percent_Over, 3) * 100
df <- group_by(df, Employee) |>
mutate(Percent_Long = Longevity_Pay/Total_Compensation)
df$Percent_Long <- round(df$Percent_Long, 3) * 100
df <- group_by(df, Employee) |>
mutate(Percent_Base = Base_Salary/Total_Compensation)
df$Percent_Base <- round(df$Percent_Base, 3) * 100
df <- group_by(df, Employee) |>
mutate(Percent_Add_Comp = Additional_Compensation/Total_Compensation)
df$Percent_Add_Comp <- round(df$Percent_Add_Comp, 3) * 100
df <- group_by(df, Employee) |>
mutate(Percent_Over_Long = Base_Salary/Total_Compensation)
df$Percent_Over_Long <- round(df$Percent_Over_Long, 3) * 100Percent_Base_Total_lm <- lm(Total_Compensation ~ Percent_Base,
data = filter(df))
Percent_Over_Total_lm <- lm(Total_Compensation ~ Percent_Over,
data = filter(df))
Percent_Long_Total_lm <- lm(Total_Compensation ~ Percent_Long,
data = filter(df))
Percent_Over_Long_Total_lm <- lm(Total_Compensation ~ Percent_Over_Long,
data = filter(df))
Percent_Add_Comp_Total_lm <- lm(Total_Compensation ~ Percent_Add_Comp,
data = filter(df))In terms of these slope coefficients, longevity pay has the greatest scaling. Interestingly, the more percentage base pay you have, the less your total compensation! (negative slope!)
p6 <- df |>
group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Base, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Percent Base Pay", y = "Total Compensation", title = "Percent Base Salary, Total Compensation for All Employees") +
geom_smooth(method = "lm") +
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
p6The slope coefficient (intercept and then slope) is 2.4535346^{5}, -1576.4445085.
p10 <- df |>
group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Over, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Percent Overtime Pay", y = "Total Compensation", title = "Percent Overtime, Total Compensation for All Employees") +
geom_smooth(method = "lm") +
theme(legend.position="none") +
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
p10The slope coefficient (intercept and then slope) is 9.0650397^{4}, 1428.4602409.
p7 <- df |>
group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Long, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Percent Longevity Pay", y = "Total Compensation", title = "Percent Longevity Pay, Total Compensation for All Employees") +
geom_smooth(method = "lm") +
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
p7The slope coefficient (intercept and then slope) is 9.406297^{4}, 4668.7561887.
p8 <- df |>
group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Over_Long, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Percent Overtime or Longevity Pay", y = "Total Compensation", title = "Percent Either Overtime or Longevity Pay, Total Compensation for All Employees") +
geom_smooth(method = "lm") +
theme(legend.position="none")+
theme(plot.title = element_text(size=10))
p8The slope coefficient (intercept and then slope) is 2.4535346^{5}, -1576.4445085.
p9 <- df |>
group_by(Employee) |>
ggplot(mapping = aes(x = Percent_Add_Comp, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Percent Additional Compensation", y = "Total Compensation", title = "Percent Additional Compensation, Total Compensation for All Employees") +
geom_smooth(method = "lm") +
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
p9The slope coefficient (intercept and then slope) is 8.7709012^{4}, 1576.4445085.
Again, in terms of these slope coefficients, longevity pay has the greatest scaling. Perhaps there are other things going on in that are affecting this (higher skill = higher pay?)
Gender is one of the most cited inequalities anywhere, let alone the workplace.
We will be going through our previous analysis, but this time separate Male and Female to see if their scalings differ.
The sex with the higher (positive) scaling benefits more from overtime, longevity pay, etc.
This is a simplified analysis that does not control for how much (if at all) each sex accepts these additional forms of compensation (e.g. overtime) — it is at a higher level. That is, it is looking at the resulting inequalities between males and females regardless of how those inequalities came to be.
# Base effect on total compensation, by sex
M_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(df, Sex =="M"))
F_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(df, Sex == "F"))
# Over
M_Over_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Overtime, Sex =="M"))
F_Over_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Overtime, Sex == "F"))
# Long
M_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Longevity, Sex =="M"))
F_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Longevity, Sex == "F"))
# Over Long
M_Over_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Over_Long, Sex =="M"))
F_Over_Long_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Over_Long, Sex == "F"))
#Add Comp
M_Add_Comp_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Add_Comp, Sex =="M"))
F_Add_Comp_Base_Total_lm <- lm(Total_Compensation ~ Base_Salary,
data = filter(Employees_With_Add_Comp, Sex == "F"))NOTE: ignored in analysis, deemed insignificant to the question of gender
# Percent Base
Gender_Percentage_M_lm <- lm(Total_Compensation ~ Percent_Base,
data = filter(df,Sex =="M"))
Gender_Percentage_F_lm <- lm(Total_Compensation ~ Percent_Base,
data = filter(df, Sex =="F"))
# Percent Over
Gender_Percentage_Over_M_lm <- lm(Total_Compensation ~ Percent_Over,
data = filter(df, Sex =="M"))
Gender_Percentage_Over_F_lm <- lm(Total_Compensation ~ Percent_Over,
data = filter(df, Sex =="F"))
# Percent Long
Gender_Percentage_Long_M_lm <- lm(Total_Compensation ~ Percent_Long,
data = filter(df, Sex =="M"))
Gender_Percentage_Long_F_lm <- lm(Total_Compensation ~ Percent_Long,
data = filter(df, Sex =="F"))
# Percent Over Long
Gender_Percentage_Over_Long_M_lm <- lm(Total_Compensation ~ Percent_Over_Long,
data = filter(df, Sex =="M"))
Gender_Percentage_Over_Long_F_lm <- lm(Total_Compensation ~ Percent_Over_Long,
data = filter(df, Sex =="F"))
# Percent Add Comp
Gender_Percentage_Add_Comp_M_lm <- lm(Total_Compensation ~ Percent_Add_Comp,
data = filter(df, Sex =="M"))
Gender_Percentage_Add_Comp_F_lm <- lm(Total_Compensation ~ Percent_Add_Comp,
data = filter(df, Sex =="F"))#overall
p1 <- df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for All Employees, by Gender") +
geom_smooth(method = "lm") +
facet_wrap(~Sex)+
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
p1The slope coefficient (intercept and then slope) for males is 5255.617831, 1.0914173 and for females it is 2325.7268239, 1.0198384.
p2 <- Employees_With_Overtime |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for Those with Overtime, by Gender") +
geom_smooth(method = "lm") +
facet_wrap(~Sex) +
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
p2The slope coefficient (intercept and then slope) for males is -1.2171035^{4}, 1.3550508 and for females it is-5720.6699864, 1.1905479.
p3 <- Employees_With_Longevity |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for Those with Longevity Pay, by Gender") +
geom_smooth(method = "lm") +
facet_wrap(~Sex) +
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
p3The slope coefficient (intercept and then slope) for males is -1.0675483^{4}, 1.3218743 and for females it is -2616.177079, 1.1207052.
p4 <- Employees_With_Over_Long |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for Those with Overtime or Longevity, by Gender") +
geom_smooth(method = "lm") +
facet_wrap(~Sex) +
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
p4The slope coefficient (intercept and then slope) for males is -3.2236932^{4}, 1.5862565 and for females it is -1.7913256^{4}, 1.3598861.
p5 <- Employees_With_Add_Comp |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Base Salary vs. Total Compensatory for Those with Additional Compensation, by Gender") +
geom_smooth(method = "lm") +
facet_wrap(~Sex) +
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
p5The slope coefficient (intercept and then slope) for males is -5492.0835418, 1.2586281 and for females it is -261.9011481, 1.0971166.
It looks like males are consistently getting higher slopes than their counterparts.
The data provided was not too clear on the types of jobs each employee had.
However, I was able to pick out those jobs dealing with management.
Do these jobs follow the trends we saw earlier? Both in terms of additional compensation scaling and men benefiting from this more?
Due to limitations in the data (the spread of the managerial data does not allow for proper slope analysis) we will be relying on visuals and summaries.
Management was broken down into categories: where M4-Senior Manager (Not included in this particular data)
M3-Middle Manager
M2-Middle Manager
M1-First Level Manager
#filter
Management_1 <- filter(df, Grade == "M1")
Management_2 <- filter(df, Grade == "M2")
Management_3 <- filter(df, Grade == "M3")
#combine
Management_df <- bind_rows(Management_1, Management_2, Management_3)# all
Row_Management_df <- Management_df |> nrow()
# M F distinction
Row_M_Management_df <- Management_df |> filter(Sex =="M") |> nrow()
Row_F_Management_df <- Management_df |> filter(Sex =="F") |> nrow()
# Male Managerial distinction
Row_M_M1_Management_df <- Management_df |> filter(Grade == "M1") |> filter(Sex =="M") |> nrow()
Row_M_M2_Management_df <- Management_df |> filter(Grade == "M2") |> filter(Sex =="M") |> nrow()
Row_M_M3_Management_df <- Management_df |> filter(Grade == "M2") |> filter(Sex =="M") |> nrow()
# Female Managerial distinction
Row_F_M1_Management_df <- Management_df |> filter(Grade == "M1") |> filter(Sex =="F") |> nrow()
Row_F_M2_Management_df <- Management_df |> filter(Grade == "M2") |> filter(Sex =="F") |> nrow()
Row_F_M3_Management_df <- Management_df |> filter(Grade == "M2") |> filter(Sex =="F") |> nrow()There are 446 employees in management. 214 of those are male and 232 of those are female.
Between the management levels, males and females seem to be equally represented.
In M1, there are 17 males and 11 females.
In M2, there are 73 males and 74 females.
In M3, there are 73 males and 74 females.
Looks pretty comparable to me
p <- Management_df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Management Base Salary vs. Total Compensatory for All Employees, by Job Grade") +
geom_smooth(method = "lm") +
facet_wrap(~ Grade) +
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
pThe slopes look pretty similar…
p <- Management_df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Management Base Salary vs. Total Compensatory for All Employees, by Gender") +
geom_smooth(method = "lm") +
facet_wrap(~ Sex) +
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
pThe slopes look pretty even across the sexes too, what if we combined them?
p2 <- Management_df |>
ggplot(mapping = aes(x = Base_Salary, y = Total_Compensation)) +
geom_point(aes(color = "red")) +
scale_x_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Management Base Salary vs. Total Compensatory for All Employees, by Both Job Grade and Gender") +
geom_smooth(method = "lm") +
facet_grid(rows = vars(Grade), cols = vars(Sex)) +
theme(legend.position="none") +
theme(plot.title = element_text(size=10))
p2Yeah, the slopes look pretty similar across the charts! Remember, there is not a lot of management specific data, so some lines might appear to have more or less extension than they otherwise would.
Let’s look at the the management data to see some specific numbers that wouldn’t show up in a graph
First, let’s see how employees with management jobs compare to employees at large — to see if we picked an okay sample.
Management_df |>
group_by(Sex) |>
summarize(Mean = mean(Base_Salary)) |> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Sex | Mean |
|---|---|
| F | 150932.1 |
| M | 155994.6 |
df |>
group_by(Sex) |>
summarize(Mean = mean(Base_Salary)) |> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Sex | Mean |
|---|---|
| F | 87497.50 |
| M | 92382.93 |
Well, to maybe not too much suprise, the mean base salaries for management are much higher than average.
Management_df |>
group_by(Sex) |>
summarize(mean = mean(Total_Compensation)) |> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Sex | mean |
|---|---|
| F | 150968.2 |
| M | 157011.8 |
df |>
group_by(Sex) |>
summarize(mean = mean(Total_Compensation)) |> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Sex | mean |
|---|---|
| F | 91559.04 |
| M | 106083.94 |
Interestingly too, the mean total compensation salaries for management are much higher than average. Also, it is important to note here that there is little additional compensation above base salary for management, so perhaps this wasn’t the most ideal choice!
head(Management_df |>
filter(Sex == "M") |>
summarize(Total_Compensation) |>
arrange(desc(Total_Compensation)), 100) |> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Employee | Total_Compensation |
|---|---|
| 1323 | 256993.0 |
| 591 | 196059.0 |
| 1679 | 196059.0 |
| 1681 | 196059.0 |
| 2060 | 196059.0 |
| 3424 | 196059.0 |
| 7555 | 196059.0 |
| 7569 | 196059.0 |
| 10216 | 196059.0 |
| 10257 | 196059.0 |
| 7799 | 196040.7 |
| 7117 | 195794.5 |
| 501 | 193460.3 |
| 5669 | 193460.3 |
| 2923 | 191726.0 |
| 2017 | 190784.7 |
| 2924 | 190000.0 |
| 3252 | 183625.7 |
| 469 | 176948.0 |
| 1 | 175873.0 |
| 515 | 175873.0 |
| 529 | 175873.0 |
| 572 | 175873.0 |
| 578 | 175873.0 |
| 846 | 175873.0 |
| 1448 | 175873.0 |
| 1458 | 175873.0 |
| 1517 | 175873.0 |
| 1587 | 175873.0 |
| 1618 | 175873.0 |
| 1694 | 175873.0 |
| 1837 | 175873.0 |
| 2053 | 175873.0 |
| 2061 | 175873.0 |
| 2118 | 175873.0 |
| 2296 | 175873.0 |
| 2363 | 175873.0 |
| 3246 | 175873.0 |
| 3445 | 175873.0 |
| 3505 | 175873.0 |
| 3585 | 175873.0 |
| 3587 | 175873.0 |
| 3590 | 175873.0 |
| 3600 | 175873.0 |
| 3632 | 175873.0 |
| 3877 | 175873.0 |
| 5309 | 175873.0 |
| 5668 | 175873.0 |
| 5735 | 175873.0 |
| 7573 | 175873.0 |
| 7760 | 175873.0 |
| 7776 | 175873.0 |
| 7856 | 175873.0 |
| 7859 | 175873.0 |
| 9684 | 175873.0 |
| 10155 | 175873.0 |
| 10215 | 175873.0 |
| 10224 | 175873.0 |
| 10268 | 175872.4 |
| 364 | 175861.2 |
| 3568 | 175409.9 |
| 6413 | 175273.1 |
| 7594 | 175099.4 |
| 844 | 174796.6 |
| 1688 | 174396.4 |
| 1682 | 174093.2 |
| 7802 | 172457.9 |
| 7814 | 170084.7 |
| 3608 | 169873.5 |
| 7777 | 168747.6 |
| 5147 | 168258.1 |
| 2062 | 168234.0 |
| 6277 | 167350.4 |
| 9030 | 165543.8 |
| 6165 | 158075.4 |
| 437 | 157272.3 |
| 919 | 155976.0 |
| 3910 | 154769.8 |
| 1360 | 154508.5 |
| 2348 | 154277.8 |
| 7870 | 154201.7 |
| 7682 | 153930.2 |
| 239 | 152940.0 |
| 1412 | 152940.0 |
| 1520 | 152940.0 |
| 1593 | 152940.0 |
| 1612 | 152940.0 |
| 1619 | 152940.0 |
| 1635 | 152940.0 |
| 1764 | 152940.0 |
| 1798 | 152940.0 |
| 2021 | 152940.0 |
| 2069 | 152940.0 |
| 2077 | 152940.0 |
| 2298 | 152940.0 |
| 2318 | 152940.0 |
| 2323 | 152940.0 |
| 2371 | 152940.0 |
| 3247 | 152940.0 |
| 3266 | 152940.0 |
head(Management_df |>
filter(Sex == "F") |>
summarize(Total_Compensation) |>
arrange(desc(Total_Compensation)), 100) |> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Employee | Total_Compensation |
|---|---|
| 859 | 196059.0 |
| 1389 | 196059.0 |
| 3584 | 196059.0 |
| 7122 | 196059.0 |
| 7818 | 196059.0 |
| 7558 | 194520.4 |
| 5680 | 188679.9 |
| 7752 | 185609.7 |
| 722 | 180704.1 |
| 3599 | 179833.0 |
| 606 | 179259.9 |
| 536 | 178510.6 |
| 521 | 177064.1 |
| 537 | 175873.0 |
| 555 | 175873.0 |
| 584 | 175873.0 |
| 593 | 175873.0 |
| 1308 | 175873.0 |
| 1557 | 175873.0 |
| 1669 | 175873.0 |
| 1680 | 175873.0 |
| 1996 | 175873.0 |
| 2067 | 175873.0 |
| 2775 | 175873.0 |
| 3458 | 175873.0 |
| 3586 | 175873.0 |
| 3595 | 175873.0 |
| 3598 | 175873.0 |
| 3648 | 175873.0 |
| 5156 | 175873.0 |
| 5158 | 175873.0 |
| 5315 | 175873.0 |
| 5488 | 175873.0 |
| 5788 | 175873.0 |
| 6065 | 175873.0 |
| 6086 | 175873.0 |
| 7036 | 175873.0 |
| 7687 | 175873.0 |
| 7779 | 175873.0 |
| 8864 | 175873.0 |
| 9861 | 175873.0 |
| 10129 | 175873.0 |
| 10162 | 175873.0 |
| 10206 | 175873.0 |
| 10174 | 175867.1 |
| 10159 | 175259.7 |
| 604 | 174876.5 |
| 6275 | 174165.9 |
| 6308 | 174165.9 |
| 834 | 174000.0 |
| 3421 | 172415.2 |
| 608 | 172026.6 |
| 7797 | 171540.0 |
| 10212 | 171540.0 |
| 9712 | 171520.4 |
| 9859 | 170930.7 |
| 6064 | 170203.8 |
| 824 | 169159.0 |
| 8977 | 169114.8 |
| 7745 | 168128.4 |
| 6066 | 167791.7 |
| 7815 | 167314.0 |
| 5802 | 167227.2 |
| 115 | 166140.0 |
| 9721 | 165998.0 |
| 5164 | 164333.0 |
| 8697 | 158007.4 |
| 5163 | 157550.0 |
| 5507 | 156050.5 |
| 7559 | 155747.5 |
| 1670 | 153000.7 |
| 465 | 152940.0 |
| 471 | 152940.0 |
| 504 | 152940.0 |
| 579 | 152940.0 |
| 1306 | 152940.0 |
| 1433 | 152940.0 |
| 1477 | 152940.0 |
| 1547 | 152940.0 |
| 1566 | 152940.0 |
| 1609 | 152940.0 |
| 1961 | 152940.0 |
| 2008 | 152940.0 |
| 2036 | 152940.0 |
| 2066 | 152940.0 |
| 2364 | 152940.0 |
| 3387 | 152940.0 |
| 3577 | 152940.0 |
| 3589 | 152940.0 |
| 3645 | 152940.0 |
| 3893 | 152940.0 |
| 4066 | 152940.0 |
| 5761 | 152940.0 |
| 6483 | 152940.0 |
| 6498 | 152940.0 |
| 6574 | 152940.0 |
| 6835 | 152940.0 |
| 7664 | 152940.0 |
| 7698 | 152940.0 |
| 8973 | 152940.0 |
Sorry for the scrolling, but in terms of the “glass-ceiling” — it seems like the top earners are males, with female compensation dropping off more quickly than male compensation
It is also interesting to note that males had higher mean base salary and mean total compensation for both management and the overall dataset.
Further analysis can decipher what the other “Grade” job codes are to see if this trend continues
Was our data consistent among all departments? Are one or a few departments possible skewing the data (and the above discussed scaling?)
p2 <- df |>
ggplot(mapping = aes(x = Department, y = Total_Compensation)) +
geom_point(aes(color = Sex, shape = Sex)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Department", y = "Total Compensation", title = "Total Compensatory for All Employees, by Department and Sex") +
theme(plot.title = element_text(size=10)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
p2It seems like M and F are pretty okay in terms of the spread. FRS and POL looks male dominated and HHS looks female dominated.
| Department | Department_Name |
|---|---|
| HHS | Department of Health and Human Services |
| Department | Department_Name |
|---|---|
| FRS | Fire and Rescue Services |
| Department | Department_Name |
|---|---|
| POL | Department of Police |
p2 <- df |>
ggplot(mapping = aes(x = Department, y = Total_Compensation)) +
geom_point(aes(color = Percent_Base, size = Sex)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Percent Base Salary vs. Total Compensatory for All Employees, by Department and Sex") + theme(plot.title = element_text(size= 10)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
p2COR, FRS, and POL stick out as having higher total compensation, and this compensation (shown in dark blue) becomes higher at the point where it no longer is because of their base salary as much.
p2 <- df |>
ggplot(mapping = aes(x = Department, y = Total_Compensation)) +
geom_point(aes(color = Sex, size = Percent_Add_Comp)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Percentage Additional Compensation vs. Total Compensatory for All Employees, by Department and Sex") + theme(plot.title = element_text(size=9)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
p2That is, same as above, you can see COR, FRS, and POL start to have total compensation right where they start to get less base pay as a percentage of their total pay.
p2 <- df |>
ggplot(mapping = aes(x = Department, y = Total_Compensation)) +
geom_point(aes(color = Percent_Base, shape = Sex, size = Base_Salary)) +
scale_size_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
scale_y_continuous(labels = label_number(suffix = " K", scale = 1e-3)) +
labs(x = "Base Salary", y = "Total Compensation", title = "Percent Base Salary vs. Total Compensatory for All Employees, by Department and Sex") + theme(plot.title = element_text(size=10)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
p2Here’s an interesting visual showing the same information as the two previous — but here again we still se COR, FRS, and POL sticking out.
Those three lines with lower percent bases and higher total comp are COR FRS and POL.
# This was interesting code to write
# filtered the entire df to show only COR entries, headed that data frame to one entry (row)
# ungrouped that entry by employee so I could just select columns department and department name
# then make it into a table
# COR
head(filter(df, Department == "COR"), 1) |> ungroup(Employee) |>
select(1:2) |> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Department | Department_Name |
|---|---|
| COR | Correction and Rehabilitation |
#2 FRS
head(filter(df, Department == "FRS"), 1) |> ungroup(Employee) |>
select(1:2)|> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Department | Department_Name |
|---|---|
| FRS | Fire and Rescue Services |
# POL
head(filter(df, Department == "POL"), 1) |> ungroup(Employee) |>
select(1:2) |> gt()|>
opt_row_striping() |>
opt_stylize(style = 4, color = "red")| Department | Department_Name |
|---|---|
| POL | Department of Police |
Based on what the three departments are, as shown in the tables above, do you have any idea why they would stick out? (I don’t have an hypothesis, personally)
We saw a lack of evidence for there being diminishing returns for getting additional compensation
Moreover, males tend to experience less diminishing returns than their counterparts.
The gender of participants seems to be pretty consistent across all departments, and three departments stick out for how they compensate their employees.
Management is not too much like the entire data in terms of mean base salary and mean total compensation, but the slopes of the lines do not look too radical